What is claimed is:

1. A method for recovering from failures affecting a
resource manager within a group of resource managers,
wherein the resource managers within the group have access
to a shared resource via which remote resource managers
communicate with the resource managers within the group,
the shared resource including data storage structures to
which resource managers within said group connect to send
and receive communications, the method comprising:

storing, within a first data storage structure of the
shared resource, unit of work descriptors for operations
performed in relation to said shared resource by the
resource managers in said group;

sending a notification of a connection failure between
a second data storage structure of the shared resource and
a first resource manager within said group, the
notification being sent to the remaining resource managers
within the group which are connected to the second data
storage structure;

one or more of said remaining resource managers
accessing said first data storage structure and analysing
the unit of work descriptors to identify the units of work
relating to the second data storage structure that were
being performed by the first resource manager when the
connection failure occurred; and

said one or more remaining resource managers
recovering the identified units of work.
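The identification step of claim 1 can be illustrated with a minimal in-memory sketch. It assumes a single process, models the first data storage structure as a plain list of descriptors, and tracks which managers are connected to each structure in a dictionary. The names SharedResource, UowDescriptor, on_connection_failure and units_to_recover, and the sample identifiers QM1, QSTRUCT1 and so on, are hypothetical and do not come from the source; the actual recovery of the selected units is left out.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of claim 1; not a real coupling facility API.

@dataclass
class UowDescriptor:
    uow_id: str
    owner: str       # resource manager that performed the unit of work
    structure: str   # data storage structure the work relates to

@dataclass
class SharedResource:
    admin: list = field(default_factory=list)        # the "first data storage structure"
    connections: dict = field(default_factory=dict)  # structure name -> connected managers

    def on_connection_failure(self, structure, failed):
        """Drop the failed connection and return the peers to notify."""
        peers = self.connections.setdefault(structure, set())
        peers.discard(failed)
        return sorted(peers)

def units_to_recover(resource, structure, failed):
    """Select the failed manager's units of work on the given structure."""
    return [d.uow_id for d in resource.admin
            if d.owner == failed and d.structure == structure]

cf = SharedResource()
cf.connections["QSTRUCT1"] = {"QM1", "QM2", "QM3"}
cf.admin += [UowDescriptor("U1", "QM1", "QSTRUCT1"),
             UowDescriptor("U2", "QM2", "QSTRUCT1"),
             UowDescriptor("U3", "QM1", "QSTRUCT2")]
notified = cf.on_connection_failure("QSTRUCT1", "QM1")  # ['QM2', 'QM3']
todo = units_to_recover(cf, "QSTRUCT1", "QM1")          # ['U1']
```

Only U1 is selected: U2 belongs to another manager and U3 relates to a different structure, so it would be handled by whichever managers are notified about that structure.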

2. A method according to claim 1 wherein, if there are no
remaining resource managers connected to the second data
storage structure after said connection failure, said
notification is sent to a remaining resource manager when
that resource manager connects to the second data storage
structure.

3. A method according to claim 1 wherein, if there are no
remaining resource managers connected to the second data
storage structure after said connection failure, the failed 
resource manager determines when it is restarted whether 
any other resource manager has performed recovery for its 
units of work relating to the second data storage structure 
and, upon determining that no resource manager has 
performed said recovery, the restarted resource manager 
recovers said units of work. 
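Claims 2 and 3 cover the case where no peer is left connected to receive the notification. A rough sketch, assuming per-structure bookkeeping held in one object, might hold the notice until another manager connects and let a restarting manager check whether its own work was already recovered. StructureState and its methods are hypothetical names chosen for illustration.

```python
# Hypothetical bookkeeping for one data storage structure (claims 2 and 3).

class StructureState:
    def __init__(self):
        self.connected = set()   # resource managers currently connected
        self.pending = []        # failure notices nobody was connected to receive
        self.recovered = set()   # managers whose units of work have been recovered

    def fail(self, manager):
        """Disconnect a manager; hold the notice if no peer remains (claim 2)."""
        self.connected.discard(manager)
        if not self.connected:
            self.pending.append(manager)
            return []
        return sorted(self.connected)

    def connect(self, manager):
        """Deliver held notices about other managers to a newly connected one."""
        self.connected.add(manager)
        notices = [m for m in self.pending if m != manager]
        self.pending = []  # a restarting manager's own notice is handled as in claim 3
        return notices

    def needs_own_recovery(self, manager):
        """Claim 3: on restart, recover own work unless another manager already did."""
        return manager not in self.recovered

s = StructureState()
s.connected = {"QM1"}
held = s.fail("QM1")          # [] : no peer left, so the notice is held
delivered = s.connect("QM2")  # ['QM1'] : delivered when QM2 connects
```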

4. A method according to claim 1, wherein all remaining
resource managers within the group which are connected to
the second data storage structure respond to said
notification by attempting to access said first data
storage structure to identify units of work to recover, and
the method includes the further steps of:

responsive to a first remaining resource manager
identifying a unit of work to recover, said first remaining
resource manager attempting to set a flag for said unit of
work;

responsive to successfully setting said flag, assigning
recovery responsibility for said unit of work to said first
remaining resource manager; and

refusing to assign recovery responsibility for said
unit of work to said first remaining resource manager if
said flag has been set by another remaining resource
manager.

5. A method according to claim 4, including the further
step of:

responsive to said flag having been set by another
remaining resource manager, said first remaining resource
manager attempting to identify a further unit of work to
recover and attempting to set a flag for said identified
further unit of work.
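The flag mechanism of claims 4 and 5 behaves like an atomic test-and-set per unit of work: the first manager to set the flag takes responsibility, and a manager that loses the race moves on to the next candidate. The sketch below assumes a lock-protected dictionary stands in for the flags held in the shared resource; RecoveryFlags, try_set and claim_next are hypothetical names.

```python
import threading

class RecoveryFlags:
    """Hypothetical stand-in for per-unit-of-work flags in the shared resource.
    The lock models the atomic test-and-set the claims rely on (an assumption)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = {}  # uow_id -> manager that set the flag

    def try_set(self, uow_id, manager):
        with self._lock:
            if uow_id in self._owner:
                return False  # flag already set: refuse responsibility (claim 4)
            self._owner[uow_id] = manager
            return True       # responsibility assigned to this manager

def claim_next(flags, candidates, manager):
    """Claim 5: keep trying further units of work until a flag is set."""
    for uow_id in candidates:
        if flags.try_set(uow_id, manager):
            return uow_id
    return None  # every candidate was already claimed

flags = RecoveryFlags()
a = claim_next(flags, ["U1", "U2"], "QM2")  # 'U1'
b = claim_next(flags, ["U1", "U2"], "QM3")  # 'U2' : U1 already flagged
c = claim_next(flags, ["U1", "U2"], "QM2")  # None : nothing left to claim
```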

6. A method according to claim 4, including the following
steps in response to a connection failure between the
second data storage structure of the shared resource and
said first remaining resource manager during recovery of
said unit of work:

sending a notification of said connection failure to
the remaining resource managers within the group which are
connected to the second data storage structure;

one or more of said remaining resource managers
accessing said first data storage structure and analysing
the unit of work descriptors to identify the units of work
relating to the second data storage structure that were
being performed by the first remaining resource manager
when the connection failure occurred; and

said one or more remaining resource managers
recovering the identified units of work.

7. A method according to claim 1, wherein the unit of 
work descriptors include: 
a unit of work identifier;

an identification of messages put or retrieved within
the unit of work;

a status for the unit of work; and

a sequence number. 

8. A method according to claim 1, wherein the shared 
resource is a coupling facility list structure, the second 
data storage structure is a coupling facility list 
structure in which a coupling facility list header 
represents a shared access message queue, and the first
data storage structure is an administration list structure
of the coupling facility for storing unit of work
descriptors.

9. A method according to claim 8, including storing
within the coupling facility, for each resource manager
within the group, a list header information map
representing the set of shared access message queues within 
the second data storage structure for which the resource 
manager has performed some work. 
10. A method according to claim 9, including reading said
list header information map during recovery to identify the
set of shared access message queues within the second data
storage structure for which the failed resource manager has
performed some work.
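One plausible representation of the list header information map in claims 9 and 10 is a bit map with one bit per list header. The sketch below assumes exactly that and uses a Python integer as the bit map; ListHeaderMap, record_work and headers are hypothetical names, and the source does not say how the map is encoded.

```python
class ListHeaderMap:
    """Hypothetical per-manager map: bit n set means the manager has done
    some work on the shared queue held in list header n."""

    def __init__(self):
        self.bits = 0

    def record_work(self, header):
        self.bits |= 1 << header

    def headers(self):
        """Claim 10: the list headers (queues) that recovery has to examine."""
        return [n for n in range(self.bits.bit_length()) if (self.bits >> n) & 1]

maps = {"QM1": ListHeaderMap(), "QM2": ListHeaderMap()}  # one map per manager (claim 9)
for header in (3, 17, 3):
    maps["QM1"].record_work(header)
maps["QM2"].record_work(5)
to_scan = maps["QM1"].headers()  # [3, 17] if QM1 is the failed manager
```

Reading the failed manager's map narrows recovery to the queues it actually touched instead of scanning every list header in the structure.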

11. A method according to claim 1, including storing
within the shared resource a structure interest map
identifying the set of data storage structures to which
respective resource managers within said group are
connected.

12. A method according to claim 11, wherein the step of
recovering the identified units of work is a first recovery
phase and wherein the method includes a second recovery
phase comprising the steps of:

reading the structure interest map for the failed
resource manager to identify the set of data storage
structures to which the failed resource manager was
connected at the time of said connection failure;

identifying any operations performed by the failed
resource manager on said set of data storage structures
which were not recovered in the first recovery phase; and

one or more of said remaining resource managers then
backing out said unrecovered operations.

13. A method according to claim 12, wherein the method
includes setting a key for operations performed in relation
to the shared resource, the key identifying the resource
manager which performed the operation, and wherein the
identification of operations performed by the failed
resource manager comprises checking said keys for
unrecovered operations performed in relation to any of said
set of data storage structures.
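The second recovery phase of claims 11 to 13 can be sketched as a filter: take the structures listed for the failed manager in the structure interest map, then back out any operation on those structures whose key names the failed manager and which phase one did not recover. The Operation record, the second_phase function and the sample data are hypothetical; marking an operation as recovered stands in for a real back-out.

```python
from dataclasses import dataclass

@dataclass
class Operation:
    structure: str   # data storage structure the operation touched
    key: str         # identifies the performing resource manager (claim 13)
    recovered: bool  # True if handled in the first recovery phase

def second_phase(interest_map, operations, failed):
    """Back out the failed manager's operations that phase one missed."""
    structures = interest_map.get(failed, set())  # claim 12: read the interest map
    backed_out = []
    for op in operations:
        if op.structure in structures and op.key == failed and not op.recovered:
            op.recovered = True  # placeholder for a real back-out
            backed_out.append(op)
    return backed_out

interest = {"QM1": {"S1", "S2"}, "QM2": {"S1"}}  # structure interest map (claim 11)
ops = [Operation("S1", "QM1", True),   # already recovered in phase one
       Operation("S2", "QM1", False),  # missed by phase one: backed out now
       Operation("S3", "QM1", False),  # outside QM1's interest map, not scanned
       Operation("S2", "QM2", False)]  # keyed to another manager
done = second_phase(interest, ops, "QM1")
```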

14. A method according to claim wherein a single unit 
of work represented by a unit of work descriptor may
include operations performed in relation to a plurality of
data storage structures, and wherein the partial units of
work corresponding to said operations are recovered by
different ones of said remaining resource managers within
the group.

15. A method for recovering from failures affecting a
resource manager within a group of resource managers,
wherein the resource managers within the group have access
to a shared resource, the shared resource including data
storage structures to which resource managers within said
group connect to perform operations in relation to data
held in said shared resource, the method comprising:

storing, within a first data storage structure of the
shared resource, unit of work descriptors for operations
performed by the resource managers in said group in
relation to data held in said shared resource;

sending a notification of a connection failure between
a second data storage structure of the shared resource and
a first resource manager within said group, the
notification being sent to the remaining resource managers
within the group which are connected to the second data
storage structure;

one or more of said remaining resource managers
accessing said first data storage structure and analysing
the unit of work descriptors to identify the units of work
relating to the second data storage structure that were
being performed by the first resource manager when the
connection failure occurred; and

said one or more remaining resource managers
recovering the identified units of work.
16. A method according to claim 15, wherein the data
storage structures of said shared resource include data
storage structures which contain shared message queues and
said operations performed in relation to said shared
resource include putting messages onto a shared message
queue and retrieving messages from a shared message queue,
for communication between a remote resource manager and
resource managers within said group.

17. A method according to claim 1&, wherein the unit of 
work descriptors include:

a unit of work identifier;

an identification of messages put or retrieved within
the unit of work;

a status for the unit of work; and

a sequence number. 

18. A method according to claim 16, wherein the operations
of putting messages onto a shared queue and retrieving
messages from a shared queue are performed under
transactional scope such that a message which is put is
only available to resource managers other than the resource
manager putting the message after commitment of the put
operation and a message which is retrieved is only
available to the retrieving resource manager after
commitment of the retrieval operation, and wherein said
stored unit of work descriptors identify each of the
following:

units of work that were uncommitted but for which a
decision to commit had been made when the failure
occurred;

units of work that were uncommitted but for which a
decision to abort had been made when the failure
occurred; and

units of work for which no commit or abort decision
had been made when the failure occurred;
and wherein recovering the identified units of work
comprises:

committing message put and retrieval operations for
which a decision to commit had been made;

backing out message put and retrieval operations for
which a decision to back out had been made; and

backing out message put and message retrieval
operations for which no commit or abort decision had been
made.
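The recovery rule in claim 18 reduces to one decision per uncommitted unit of work: commit only when a commit decision had been recorded, and back out in both other cases. The sketch below assumes the recorded decision is available as an enumeration value; Decision, recovery_action and plan_recovery are hypothetical names, and the sketch only plans the actions rather than applying them to any queue.

```python
from enum import Enum

class Decision(Enum):
    COMMIT = "commit"  # uncommitted, but a commit decision had been made
    ABORT = "abort"    # uncommitted, but an abort decision had been made
    NONE = "none"      # no commit or abort decision had been made

def recovery_action(decision):
    """Claim 18: commit only if commit was decided; otherwise back out."""
    return "commit" if decision is Decision.COMMIT else "backout"

def plan_recovery(uows):
    """Map each uncommitted unit of work id to the action recovery should take."""
    return {uow_id: recovery_action(d) for uow_id, d in uows.items()}

plan = plan_recovery({"U1": Decision.COMMIT,
                      "U2": Decision.ABORT,
                      "U3": Decision.NONE})
# {'U1': 'commit', 'U2': 'backout', 'U3': 'backout'}
```

Treating "no decision" the same as "abort" is what the claim specifies; it is the conservative choice, since a put that was never committed never became visible to other resource managers.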

19. A distributed data processing system including:

a plurality of resource managers;

a shared access resource including data storage
structures to which the resource managers connect to send
and receive communications to and from remote resource
managers, the shared access resource including:

means for storing, within a first data storage
structure of the shared resource, unit of work
descriptors for operations performed in relation to
said shared resource by the resource managers in said
plurality; and

means for sending a notification of a connection
failure between a second data storage structure of the
shared resource and a first resource manager within
said plurality, the notification being sent to the
remaining resource managers within the plurality which
are connected to the second data storage structure;

wherein said remaining resource managers include:

means for accessing said first data storage
structure and analysing the unit of work descriptors
to identify the units of work relating to the second
data storage structure that were being performed by
the first resource manager when the connection failure
occurred; and

means for recovering the identified units of
work.

20. A computer program product comprising program code
recorded on a machine-readable recording medium, the
program code comprising the following set of components:

a plurality of resource managers;

a shared access resource manager including program
code for managing storage and retrieval of data within data
storage structures to which the resource managers connect
to send and receive communications to and from remote
resource managers, the shared access resource manager
including:

means for storing, within a first data storage
structure of the shared resource, unit of work
descriptors for operations performed in relation to said
shared resource by the resource managers in said
plurality; and

means for sending a notification of a connection
failure between a second data storage structure of the
shared resource and a first resource manager within said
plurality, the notification being sent to the remaining
resource managers within the plurality which are
connected to the second data storage structure;

wherein said remaining resource managers include:

means for accessing said first data storage structure
and analysing the unit of work descriptors to identify
the units of work relating to the second data storage
structure that were being performed by the first resource
manager when the connection failure occurred; and

means for recovering the identified units of work.



